Collection of Internet

Internet Message Format | 1994-01-24 | 15KB

Return-Path: <dsr@hplb.hpl.hp.com>
Received: from dxmint.cern.ch by nxoc01.cern.ch (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0) id AA14999; Mon, 11 Jan 93 16:26:35 MET
Received: by dxmint.cern.ch (5.65/DEC-Ultrix/4.3) id AA28423; Mon, 11 Jan 1993 16:41:37 +0100
Received: from dragget.hpl.hp.com by hplb.hpl.hp.com; Mon, 11 Jan 93 15:38:22 GMT
Received: by manuel.hpl.hp.com (16.6/15.6+ISC) id AA23888; Mon, 11 Jan 93 15:42:44 GMT
From: Dave_Raggett <dsr@hplb.hpl.hp.com>
Message-Id: <9301111542.AA23888@manuel.hpl.hp.com>
Subject: Re HTTP2: caching and copyright
To: www-talk@nxoc01.cern.ch
Date: Mon, 11 Jan 93 15:42:41 GMT
Cc: dsr@hplb.hpl.hp.com
Mailer: Elm [revision: 66.25]

These are comments on Tim's responses to my recent message on HTTP2.

>> o the "Expires:" field is optional

> agreed.

>> o the date values should be in a prescribed format to simplify
>> machine interpretation (Is this adequately defined by existing RFCs?)

> agreed. yes it is, in RFC850

RFC977 provides a tighter definition for date/time, restricting it to the time zone of the server or GMT. I would like us to restrict it to GMT, period; otherwise, how can you in general find out the time zone of the server?

    NEWGROUPS date time [GMT] [<distribution>]

    The date is sent as 6 digits in the format YYMMDD, where YY is the last two digits of the year, MM is the two digits of the month (with leading zero, if appropriate), and DD is the day of the month (with leading zero, if appropriate). The closest century is assumed as part of the year (i.e., 86 specifies 1986, 30 specifies 2030, 99 is 1999, 00 is 2000).

    Time must also be specified. It must be as 6 digits HHMMSS with HH being hours in the 24-hour clock, MM minutes 00-59, and SS seconds 00-59. The time is assumed to be in the server's time zone unless the token "GMT" appears, in which case both time and date are evaluated at the 0 meridian.
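To make the RFC977 rules above concrete, here is a minimal Python sketch of parsing the NEWGROUPS date and time arguments. It assumes "closest century" is measured from a caller-supplied reference year (for instance the current year); the function names are hypothetical and not part of NNTP or HTTP2. When the GMT token is absent the result is deliberately left without a time zone, which is exactly the ambiguity raised above: a client cannot in general know the server's zone.

```python
from __future__ import annotations

from datetime import datetime, timezone


def expand_two_digit_year(yy: int, reference_year: int) -> int:
    """Pick the year ending in `yy` closest to `reference_year`
    (assumed reading of RFC977's "closest century" rule)."""
    base = reference_year - reference_year % 100 + yy  # same century as reference
    candidates = (base - 100, base, base + 100)
    return min(candidates, key=lambda year: abs(year - reference_year))


def parse_newgroups_datetime(date: str, time: str, gmt: bool,
                             reference_year: int) -> datetime:
    """Parse the YYMMDD and HHMMSS arguments of a NEWGROUPS request."""
    if len(date) != 6 or len(time) != 6 or not (date.isdigit() and time.isdigit()):
        raise ValueError("date and time must each be exactly 6 digits")
    year = expand_two_digit_year(int(date[0:2]), reference_year)
    month, day = int(date[2:4]), int(date[4:6])
    hour, minute, second = int(time[0:2]), int(time[2:4]), int(time[4:6])
    # Without "GMT" the time is in the server's own zone, which the client
    # cannot discover in general, so we return a naive (zone-less) datetime.
    tz = timezone.utc if gmt else None
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)
```

With a reference year of 1993 this reproduces the RFC's own examples: 86 gives 1986, 30 gives 2030, 99 gives 1999 and 00 gives 2000.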
RFC850 mentions that not all time zones have well-known abbreviations, making it difficult to carry out date/time arithmetic. Furthermore, Kevin Hoadley's comment:

    This also depends on hosts agreeing on the date. To quote RFC1128, talking about a 1988 survey of the time/date on Internet hosts, "... a few had errors as much as two years"

suggests we will have problems with servers in this regard. One solution would be for the server to send a field with each document:

    KeepFor: nnnn seconds | mmm days

The cache management software then notes the date/time received and works out for itself the expiry date/time. This method has the great advantage of avoiding all need for date/time conversion, and any reliance on servers having their clocks set up correctly.

>> I think that we need to provide an operation in which the server returns a
>> document only if it is later than a date/time supplied with the request. If
>> it is the same (or earlier) the server should return a suitable status code
>> and an optional "Cost:" header, see below.

> Need to look at NNTP here. We end up getting very close indeed to it.
> I would want the functionality of this search to map onto the NEWNEWS
> very nicely. A newsgroup is just a hypertext list anyway.

I like the NEWNEWS command, but feel we should keep the GET & SINCE command. The latter allows you to refresh a cached document with one exchange, whereas you need two when using NEWNEWS and a subsequent GET. The NEWNEWS command is intended for finding new basenotes and responses, and should be contrasted with the NEWGROUPS command.

>> Note that servers shouldn't cache documents with restricted readership since
>> each server doesn't know the restrictions to apply. This requires a further
>> header to identify such documents as being unsuitable for general caching:
>>
>>     Distribution: restricted | unrestricted

> Good point. Note that the distribution of other messages is in the form of
> To: and Cc: and Newsgroups: and in fact Distribution:.
> (See
> http://info.cern.ch/hypertext/WWW/Protocols/rfc850/rfc850.html#z12)
> So you'll need a new fieldname. If we could only merge the functionality of
> these systems in some cool way, it would be grand.

I don't understand the description of "Distribution: nj.all" in RFC850 (section 2.2.8). It is unclear what its argument is. Is it a geographical hierarchy, or is it some kind of newsgroup name with the "all" wildcard?

It would be nice in some circumstances to define the readership groups for situations where a server could apply group membership information to restrict readership. This field would be supplied by the author. This idea is, I believe, in the same spirit as RFC850. Consider the following example:

    Distribution: incl.kbpd psl.all

This says that the document can be given to anyone in psl and anyone in the kbpd subgroup of incl. You can make these names correspond to your organisation. The maintenance of these readership groups is outside the scope of the HTTP2 protocol. Local servers shouldn't cache documents including this header unless they "understand" the specified readership groups and can apply the same membership tests. This involves sharing the same definitions across a group of servers, for instance within a campus or a company.

>> I would like the document header to include an optional cost header, e.g.
>>
>>     Cost: 4.05 US DOLLARS
>>     Copyright: Reuters Inc.

> I note here that both the copyright holder and the account for charging are
> items in some address space, and we ought to be as flexible with these
> address spaces as with the udi. So I would propose something like
>
>     ChargeTo: HPInternal:/8126/148689 upto $2.00
>
> would be better. But how does this fit in with authentication? Once you
> are authenticated, your preferred method of paying will be known. You can't
> have charging without authentication!

Four points:

First, it certainly isn't the case that once you are authenticated the way of charging you is known.
For example, consider members of the public wishing to pay for information with a credit card number for a service they have never accessed before. In HP it may in some cases be sufficient to check that the client's Internet address starts with the company's subnet code. However, the server still needs the employee name, number and location code to cross-charge.

Second, I don't think the "upto" concept is needed. In the vast majority of cases a fixed cost will suffice. A point to watch when keeping documents in local caches is that this cost may change from time to time. This corresponds to the pricing of normal goods, and I believe it can be adequately handled by appropriately setting the "KeepFor" or "Expires" field.

Third, for legal purposes it is still necessary to tag documents with who owns the copyright, as with books, music and other products. For this reason we should include the "Copyright:" field.

Fourth, I like the idea of a universal scheme for naming the copyright owner and charging method, but feel that this will take some time to take effect. For the moment I would like to stick to the following:

    Cost: 4.05 US DOLLARS
    Copyright: 1988 Time International Inc.
    ChargeTo: HP/8126/148689

where the meaning of the "Copyright:" and "ChargeTo:" fields is outside the scope of the HTTP2 protocol specification. The "Cost:" field always starts with the amount, followed by the currency name. The requesting GET could include an optional header:

    CostingUpTo: 2.50 US DOLLARS

This would result in the server returning an error message if this amount was less than the cost of the requested document, and introduces the issues of recognising the currency type and performing currency conversion. Users should be able to see how much they will have to pay on preceding hypertext pages (as supplied by the server).

> A simple thing in the first instance is to say that it is illegal to cache
> a for-pay document unless you have a private arrangement with the owner
> about refunding him.
> This could be done using a completely separate billing
> process.

No. You can only get copies of documents for which the server recognises that there is an effective arrangement for making the payment. However, if this is the case, then caching presents no problems, provided the authentication and ChargeTo information is preserved and supplied to the server with the GOT command.

I will try to lay my hands on a copy of "Literary Machines".

>> The protocol ought to allow for multiple GOT statements (and associated
>> headers) in the same message. For this it seems simple enough to require a
>> terminating blank line.

> Hey, that's not something you do for one method, it's a change to the whole
> protocol to introduce pipelining.

Oh dear! It seems a waste to have to set up a connection for each such request. Perhaps the safest thing is to allow multiple Udi's with the GOT command, all of which must be for the same client. What limits are there on line length for headers, or is there a mechanism for continuing arguments on subsequent lines? This would still be effective in limiting network traffic and processing time.

>> Effective support for discussion groups
>>
>> My model is that discussion groups each have unique Udi's. Each discussion
>> group has a sequence of base notes, and each base note is associated with a
>> sequence of responses. I am unsure of how to deal with cross postings!

> I agree that the POST method is well defined as a method of the
> newsgroup class which takes an article as a parameter. In fact, as you say,
> cross-posting makes a mess of this, as it involves many groups in one atomic
> operation. This is a peculiarity of news which makes it difficult to map
> onto the object model. Any ideas?

The NNTP protocol employs the POST command to post an article, and relies on the article's Newsgroups header to specify the news groups to post to.
The "References:" header is used to link a response to any articles prompting submission of this article. Thus each article can be posted to multiple groups, and can have zero or more references to preceding articles.

For convergence between NNTP and HTTP2 we need to clarify the mapping of groups and references. News groups as currently defined are hierarchical name spaces without reference to a server or filing system. The WWW model currently ties documents to both of these.

I would like to be able to post responses as WWW documents which refer to one (or more?) existing documents. We can already do this. What we can't do is find which documents reference a given document. This is hard in principle and in practice, since the various documents can be on different machines scattered over the entire world. The answer is to provide a mechanism which allows servers to track which Udi's should be recorded as being "responses" to other Udi's.

What is the analogous concept to a news group? News groups are lists of articles, not articles as such. The GROUP command in NNTP allows you to identify articles in given groups. The LIST command returns the complete list of groups known to the server, while the NEWGROUPS command returns groups created since a specified date/time, optionally matching specified distribution categories. I think that in WWW we should treat groups as named documents which are generated by the server from the database of postings stored under that name. The important thing is to distinguish between the Udi's of references and those of news groups.
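One way to picture the server-side bookkeeping proposed above is a small postings database that records, for each posting, the groups it was posted to and the Udi's it references. The Python sketch below is hypothetical (the class and method names are not from any specification): it keeps group membership and the reverse "responses" index in separate tables, so the Udi's of references are never confused with those of news groups, and it generates a group as a plain list of member Udi's.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PostingStore:
    """Hypothetical in-memory postings database for a single server."""
    # group name -> Udi's of documents posted to that group, in posting order
    groups: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))
    # referenced Udi -> Udi's of documents that cite it in their References
    responses: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))

    def post(self, udi: str, newsgroups: list[str], references: list[str]) -> None:
        """Record a posting; a cross-post simply lists several groups."""
        for group in newsgroups:
            self.groups[group].append(udi)
        for ref in references:  # maintain the reverse index at posting time
            self.responses[ref].append(udi)

    def group_document(self, group: str) -> str:
        """Generate the group as a named document: one member Udi per line."""
        return "\n".join(self.groups.get(group, []))

    def responses_to(self, udi: str) -> list[str]:
        """List the Udi's of documents posted in response to `udi`."""
        return list(self.responses.get(udi, []))
```

Because the reverse index is updated as each posting arrives, listing the responses to a document is a single lookup. It only covers postings this server has seen, which is the limitation noted above for documents scattered across many machines.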
Given these ideas, I will now present my suggestions for the POST command. The document header supplied with the POST command has the following fields:

    Newsgroups: <followed by one or more Udi's>   /* optional */
    References: <followed by one or more Udi's>   /* optional */

followed by one of the following:

    DocumentName: Udi   /* for an existing document; body is void */
    NewDocument:        /* for a new document; contents follow as body */

The semantics are the same as for NNTP, except that the Newsgroups header is optional. In other words, you can post responses to any WWW document; it doesn't need to be in a news group. The server should return the Udi of the document if successful (note that the NNTP POST command doesn't bother with this).

We can include support for the ARTICLE, BODY, HEAD, LIST and NEWGROUPS commands in a way very similar to NNTP. The GROUP command in NNTP returns the first and last article numbers in the group. This is unlikely to be what we want, as it depends on the special naming scheme used for network news articles. We are probably more interested in getting a list of article names in the group. In my earlier message I suggested that this could best be achieved using the GET command in conjunction with SINCE and BEFORE parameters (to allow for really humungous groups with thousands of base notes). The server is responsible for interpreting this command with the appropriate database query.

A really useful command missing from NNTP is the ability to list the responses to a given document: the command names a given document/article, and the server returns the list of Udi's for documents which were posted with that article as part of their references. It would be great if this list was sent by the server along with the base document, as a separate part in a multipart message.

Finally, I want to draw attention to post-it style annotations. It would be really nice to be able to post a note at a particular point within a given document.
The browser would show such annotations as little post-it symbols which you click to see their contents. This requires mechanisms similar to those for discussion groups. Perhaps we could have an ANNOTATE command:

    ANNOTATE Udi   /* including an anchor for positioning the annotation */
    (body follows)

Authors place anchors in documents to suit their own needs and not the unforeseen needs of others. It is therefore necessary to generalise the anchor syntax in document Udi's to support a more flexible scheme based on pattern matching. Servers should send the document along with the list of annotations.

Comments please.

Best wishes, and sorry for such a long response,

Dave Raggett
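The idea of positioning annotations by pattern matching, rather than relying on anchors the author happened to place, might look like the following Python sketch. It assumes an annotation anchor is a regular expression plus an occurrence number; how such an anchor would be encoded in a Udi is left open, and the names Annotation, locate and place_annotations are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """A post-it note positioned by pattern matching (hypothetical format)."""
    pattern: str     # regular expression matched against the document text
    occurrence: int  # which match to attach to (0 = first)
    body: str


def locate(note: Annotation, document: str) -> Optional[int]:
    """Return the character offset for the note, or None if the pattern
    no longer matches that many times (e.g. the document was edited)."""
    for i, match in enumerate(re.finditer(note.pattern, document)):
        if i == note.occurrence:
            return match.start()
    return None


def place_annotations(document: str, notes: list[Annotation]) -> list[tuple[int, str]]:
    """Pair each note that still resolves with its offset, in document
    order, as a browser might use to draw post-it symbols."""
    placed = []
    for note in notes:
        offset = locate(note, document)
        if offset is not None:
            placed.append((offset, note.body))
    return sorted(placed)
```

A note whose pattern no longer matches after the document is edited is simply not placed; a real scheme would need some policy for such orphaned annotations.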